Data Processing

The test set has 20 observations of 160 variables. Many columns contain missing values, so keep only the variables that have data for every observation. Use sapply with sum and is.na to count the missing values in each column of the data frame pmlTest0, keep the columns whose count is 0 (i.e. the pmlTest0 variables with no missing values), and use their names to create the subset pmlTest1.

setwd("~/Dropbox/jhudatascience/8_Practical_Machine_Learning/CourseProject")
pmlTest0 <- read.csv("data/pml-testing.csv")
pmlTest0MissingCounts <- sapply(pmlTest0, function(x)sum(is.na(x)))
colNamesTest <- pmlTest0MissingCounts[pmlTest0MissingCounts==0]
pmlTest1 <- pmlTest0[,names(colNamesTest)]
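As a minimal sketch of the same selection on a toy data frame (the data frame toy and its columns are hypothetical, not part of the project data), colSums(is.na(...)) gives the per-column missing counts in one call:

```r
# Hypothetical toy data frame standing in for pmlTest0.
toy <- data.frame(
  id   = 1:4,
  roll = c(1.4, 1.5, 1.6, 1.7),
  avg  = c(NA, NA, 2.0, NA),   # partly missing
  kurt = c(NA, NA, NA, NA)     # entirely missing
)

# Count missing values per column, then keep the columns with none.
missingCounts <- colSums(is.na(toy))
completeCols  <- names(missingCounts)[missingCounts == 0]

# drop = FALSE keeps a data frame even if only one column survives.
toyComplete <- toy[, completeCols, drop = FALSE]
print(names(toyComplete))
```

Only id and roll survive, because avg and kurt each have at least one missing value.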

The training set has 19,622 observations of 160 variables. With the default read.csv settings, many numeric variables are read as factors because the file contains empty strings and Excel divide-by-zero errors (#DIV/0!) in addition to NA. Reading the file with stringsAsFactors = FALSE and na.strings = c("#DIV/0!", "", "NA") converts #DIV/0!, "", and NA to NA, so the numeric variables are read as numbers.

setwd("~/Dropbox/jhudatascience/8_Practical_Machine_Learning/CourseProject")
pmlTrain0 <- read.csv("data/pml-training.csv")
str(pmlTrain0)
## 'data.frame':    19622 obs. of  160 variables:
##  $ X                       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ user_name               : Factor w/ 6 levels "adelmo","carlitos",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ raw_timestamp_part_1    : int  1323084231 1323084231 1323084231 1323084232 1323084232 1323084232 1323084232 1323084232 1323084232 1323084232 ...
##  $ raw_timestamp_part_2    : int  788290 808298 820366 120339 196328 304277 368296 440390 484323 484434 ...
##  $ cvtd_timestamp          : Factor w/ 20 levels "02/12/2011 13:32",..: 9 9 9 9 9 9 9 9 9 9 ...
##  $ new_window              : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ num_window              : int  11 11 11 12 12 12 12 12 12 12 ...
##  $ roll_belt               : num  1.41 1.41 1.42 1.48 1.48 1.45 1.42 1.42 1.43 1.45 ...
##  $ pitch_belt              : num  8.07 8.07 8.07 8.05 8.07 8.06 8.09 8.13 8.16 8.17 ...
##  $ yaw_belt                : num  -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 ...
##  $ total_accel_belt        : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ kurtosis_roll_belt      : Factor w/ 397 levels "","-0.016850",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ kurtosis_picth_belt     : Factor w/ 317 levels "","-0.021887",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ kurtosis_yaw_belt       : Factor w/ 2 levels "","#DIV/0!": 1 1 1 1 1 1 1 1 1 1 ...
##  $ skewness_roll_belt      : Factor w/ 395 levels "","-0.003095",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ skewness_roll_belt.1    : Factor w/ 338 levels "","-0.005928",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ skewness_yaw_belt       : Factor w/ 2 levels "","#DIV/0!": 1 1 1 1 1 1 1 1 1 1 ...
##  $ max_roll_belt           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_picth_belt          : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_yaw_belt            : Factor w/ 68 levels "","-0.1","-0.2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ min_roll_belt           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_pitch_belt          : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_yaw_belt            : Factor w/ 68 levels "","-0.1","-0.2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ amplitude_roll_belt     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_pitch_belt    : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_yaw_belt      : Factor w/ 4 levels "","#DIV/0!","0.00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ var_total_accel_belt    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avg_roll_belt           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stddev_roll_belt        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_roll_belt           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avg_pitch_belt          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stddev_pitch_belt       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_pitch_belt          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avg_yaw_belt            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stddev_yaw_belt         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_yaw_belt            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ gyros_belt_x            : num  0 0.02 0 0.02 0.02 0.02 0.02 0.02 0.02 0.03 ...
##  $ gyros_belt_y            : num  0 0 0 0 0.02 0 0 0 0 0 ...
##  $ gyros_belt_z            : num  -0.02 -0.02 -0.02 -0.03 -0.02 -0.02 -0.02 -0.02 -0.02 0 ...
##  $ accel_belt_x            : int  -21 -22 -20 -22 -21 -21 -22 -22 -20 -21 ...
##  $ accel_belt_y            : int  4 4 5 3 2 4 3 4 2 4 ...
##  $ accel_belt_z            : int  22 22 23 21 24 21 21 21 24 22 ...
##  $ magnet_belt_x           : int  -3 -7 -2 -6 -6 0 -4 -2 1 -3 ...
##  $ magnet_belt_y           : int  599 608 600 604 600 603 599 603 602 609 ...
##  $ magnet_belt_z           : int  -313 -311 -305 -310 -302 -312 -311 -313 -312 -308 ...
##  $ roll_arm                : num  -128 -128 -128 -128 -128 -128 -128 -128 -128 -128 ...
##  $ pitch_arm               : num  22.5 22.5 22.5 22.1 22.1 22 21.9 21.8 21.7 21.6 ...
##  $ yaw_arm                 : num  -161 -161 -161 -161 -161 -161 -161 -161 -161 -161 ...
##  $ total_accel_arm         : int  34 34 34 34 34 34 34 34 34 34 ...
##  $ var_accel_arm           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avg_roll_arm            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stddev_roll_arm         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_roll_arm            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avg_pitch_arm           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stddev_pitch_arm        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_pitch_arm           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avg_yaw_arm             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stddev_yaw_arm          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_yaw_arm             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ gyros_arm_x             : num  0 0.02 0.02 0.02 0 0.02 0 0.02 0.02 0.02 ...
##  $ gyros_arm_y             : num  0 -0.02 -0.02 -0.03 -0.03 -0.03 -0.03 -0.02 -0.03 -0.03 ...
##  $ gyros_arm_z             : num  -0.02 -0.02 -0.02 0.02 0 0 0 0 -0.02 -0.02 ...
##  $ accel_arm_x             : int  -288 -290 -289 -289 -289 -289 -289 -289 -288 -288 ...
##  $ accel_arm_y             : int  109 110 110 111 111 111 111 111 109 110 ...
##  $ accel_arm_z             : int  -123 -125 -126 -123 -123 -122 -125 -124 -122 -124 ...
##  $ magnet_arm_x            : int  -368 -369 -368 -372 -374 -369 -373 -372 -369 -376 ...
##  $ magnet_arm_y            : int  337 337 344 344 337 342 336 338 341 334 ...
##  $ magnet_arm_z            : int  516 513 513 512 506 513 509 510 518 516 ...
##  $ kurtosis_roll_arm       : Factor w/ 330 levels "","-0.02438",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ kurtosis_picth_arm      : Factor w/ 328 levels "","-0.00484",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ kurtosis_yaw_arm        : Factor w/ 395 levels "","-0.01548",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ skewness_roll_arm       : Factor w/ 331 levels "","-0.00051",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ skewness_pitch_arm      : Factor w/ 328 levels "","-0.00184",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ skewness_yaw_arm        : Factor w/ 395 levels "","-0.00311",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ max_roll_arm            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_picth_arm           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_yaw_arm             : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_roll_arm            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_pitch_arm           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_yaw_arm             : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_roll_arm      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_pitch_arm     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_yaw_arm       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ roll_dumbbell           : num  13.1 13.1 12.9 13.4 13.4 ...
##  $ pitch_dumbbell          : num  -70.5 -70.6 -70.3 -70.4 -70.4 ...
##  $ yaw_dumbbell            : num  -84.9 -84.7 -85.1 -84.9 -84.9 ...
##  $ kurtosis_roll_dumbbell  : Factor w/ 398 levels "","-0.0035","-0.0073",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ kurtosis_picth_dumbbell : Factor w/ 401 levels "","-0.0163","-0.0233",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ kurtosis_yaw_dumbbell   : Factor w/ 2 levels "","#DIV/0!": 1 1 1 1 1 1 1 1 1 1 ...
##  $ skewness_roll_dumbbell  : Factor w/ 401 levels "","-0.0082","-0.0096",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ skewness_pitch_dumbbell : Factor w/ 402 levels "","-0.0053","-0.0084",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ skewness_yaw_dumbbell   : Factor w/ 2 levels "","#DIV/0!": 1 1 1 1 1 1 1 1 1 1 ...
##  $ max_roll_dumbbell       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_picth_dumbbell      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_yaw_dumbbell        : Factor w/ 73 levels "","-0.1","-0.2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ min_roll_dumbbell       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_pitch_dumbbell      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_yaw_dumbbell        : Factor w/ 73 levels "","-0.1","-0.2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ amplitude_roll_dumbbell : num  NA NA NA NA NA NA NA NA NA NA ...
##   [list output truncated]
sum(is.na(pmlTrain0)) # 1,287,472: "#DIV/0!" and "" are still factor levels here, not NA
## [1] 1287472
pmlTrain1 <- read.csv("data/pml-training.csv", stringsAsFactors = FALSE, na.strings = c("#DIV/0!","","NA"))
sum(is.na(pmlTrain1)) # 1,925,102: "#DIV/0!" and "" are now counted as NA
## [1] 1925102
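The effect of na.strings can be seen on a small in-memory CSV. This is a sketch under assumptions: the four-row file below is made up to mimic the three kinds of problem values, and the column names are hypothetical.

```r
# Hypothetical CSV text with a divide-by-zero marker, an empty field, and an NA.
csvText <- "x,kurtosis\n1,0.5\n2,#DIV/0!\n3,\n4,NA\n"

# Default settings: "#DIV/0!" prevents the column from being read as numeric.
raw <- read.csv(text = csvText)

# Treat all three markers as missing, so the column can be read as numeric.
clean <- read.csv(text = csvText, stringsAsFactors = FALSE,
                  na.strings = c("#DIV/0!", "", "NA"))

print(class(raw$kurtosis))    # factor or character, depending on the R version
print(class(clean$kurtosis))
print(sum(is.na(clean$kurtosis)))
```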

Use sapply with all and is.na to test each column of the data frame pmlTrain1 and find the columns that contain only missing values. There are 6 such columns: kurtosis_yaw_belt, skewness_yaw_belt, kurtosis_yaw_dumbbell, skewness_yaw_dumbbell, kurtosis_yaw_forearm, and skewness_yaw_forearm.

sapply(pmlTrain1, function(x)all(is.na(x))) #6 variables contain all NA
##                        X                user_name     raw_timestamp_part_1 
##                    FALSE                    FALSE                    FALSE 
##     raw_timestamp_part_2           cvtd_timestamp               new_window 
##                    FALSE                    FALSE                    FALSE 
##               num_window                roll_belt               pitch_belt 
##                    FALSE                    FALSE                    FALSE 
##                 yaw_belt         total_accel_belt       kurtosis_roll_belt 
##                    FALSE                    FALSE                    FALSE 
##      kurtosis_picth_belt        kurtosis_yaw_belt       skewness_roll_belt 
##                    FALSE                     TRUE                    FALSE 
##     skewness_roll_belt.1        skewness_yaw_belt            max_roll_belt 
##                    FALSE                     TRUE                    FALSE 
##           max_picth_belt             max_yaw_belt            min_roll_belt 
##                    FALSE                    FALSE                    FALSE 
##           min_pitch_belt             min_yaw_belt      amplitude_roll_belt 
##                    FALSE                    FALSE                    FALSE 
##     amplitude_pitch_belt       amplitude_yaw_belt     var_total_accel_belt 
##                    FALSE                    FALSE                    FALSE 
##            avg_roll_belt         stddev_roll_belt            var_roll_belt 
##                    FALSE                    FALSE                    FALSE 
##           avg_pitch_belt        stddev_pitch_belt           var_pitch_belt 
##                    FALSE                    FALSE                    FALSE 
##             avg_yaw_belt          stddev_yaw_belt             var_yaw_belt 
##                    FALSE                    FALSE                    FALSE 
##             gyros_belt_x             gyros_belt_y             gyros_belt_z 
##                    FALSE                    FALSE                    FALSE 
##             accel_belt_x             accel_belt_y             accel_belt_z 
##                    FALSE                    FALSE                    FALSE 
##            magnet_belt_x            magnet_belt_y            magnet_belt_z 
##                    FALSE                    FALSE                    FALSE 
##                 roll_arm                pitch_arm                  yaw_arm 
##                    FALSE                    FALSE                    FALSE 
##          total_accel_arm            var_accel_arm             avg_roll_arm 
##                    FALSE                    FALSE                    FALSE 
##          stddev_roll_arm             var_roll_arm            avg_pitch_arm 
##                    FALSE                    FALSE                    FALSE 
##         stddev_pitch_arm            var_pitch_arm              avg_yaw_arm 
##                    FALSE                    FALSE                    FALSE 
##           stddev_yaw_arm              var_yaw_arm              gyros_arm_x 
##                    FALSE                    FALSE                    FALSE 
##              gyros_arm_y              gyros_arm_z              accel_arm_x 
##                    FALSE                    FALSE                    FALSE 
##              accel_arm_y              accel_arm_z             magnet_arm_x 
##                    FALSE                    FALSE                    FALSE 
##             magnet_arm_y             magnet_arm_z        kurtosis_roll_arm 
##                    FALSE                    FALSE                    FALSE 
##       kurtosis_picth_arm         kurtosis_yaw_arm        skewness_roll_arm 
##                    FALSE                    FALSE                    FALSE 
##       skewness_pitch_arm         skewness_yaw_arm             max_roll_arm 
##                    FALSE                    FALSE                    FALSE 
##            max_picth_arm              max_yaw_arm             min_roll_arm 
##                    FALSE                    FALSE                    FALSE 
##            min_pitch_arm              min_yaw_arm       amplitude_roll_arm 
##                    FALSE                    FALSE                    FALSE 
##      amplitude_pitch_arm        amplitude_yaw_arm            roll_dumbbell 
##                    FALSE                    FALSE                    FALSE 
##           pitch_dumbbell             yaw_dumbbell   kurtosis_roll_dumbbell 
##                    FALSE                    FALSE                    FALSE 
##  kurtosis_picth_dumbbell    kurtosis_yaw_dumbbell   skewness_roll_dumbbell 
##                    FALSE                     TRUE                    FALSE 
##  skewness_pitch_dumbbell    skewness_yaw_dumbbell        max_roll_dumbbell 
##                    FALSE                     TRUE                    FALSE 
##       max_picth_dumbbell         max_yaw_dumbbell        min_roll_dumbbell 
##                    FALSE                    FALSE                    FALSE 
##       min_pitch_dumbbell         min_yaw_dumbbell  amplitude_roll_dumbbell 
##                    FALSE                    FALSE                    FALSE 
## amplitude_pitch_dumbbell   amplitude_yaw_dumbbell     total_accel_dumbbell 
##                    FALSE                    FALSE                    FALSE 
##       var_accel_dumbbell        avg_roll_dumbbell     stddev_roll_dumbbell 
##                    FALSE                    FALSE                    FALSE 
##        var_roll_dumbbell       avg_pitch_dumbbell    stddev_pitch_dumbbell 
##                    FALSE                    FALSE                    FALSE 
##       var_pitch_dumbbell         avg_yaw_dumbbell      stddev_yaw_dumbbell 
##                    FALSE                    FALSE                    FALSE 
##         var_yaw_dumbbell         gyros_dumbbell_x         gyros_dumbbell_y 
##                    FALSE                    FALSE                    FALSE 
##         gyros_dumbbell_z         accel_dumbbell_x         accel_dumbbell_y 
##                    FALSE                    FALSE                    FALSE 
##         accel_dumbbell_z        magnet_dumbbell_x        magnet_dumbbell_y 
##                    FALSE                    FALSE                    FALSE 
##        magnet_dumbbell_z             roll_forearm            pitch_forearm 
##                    FALSE                    FALSE                    FALSE 
##              yaw_forearm    kurtosis_roll_forearm   kurtosis_picth_forearm 
##                    FALSE                    FALSE                    FALSE 
##     kurtosis_yaw_forearm    skewness_roll_forearm   skewness_pitch_forearm 
##                     TRUE                    FALSE                    FALSE 
##     skewness_yaw_forearm         max_roll_forearm        max_picth_forearm 
##                     TRUE                    FALSE                    FALSE 
##          max_yaw_forearm         min_roll_forearm        min_pitch_forearm 
##                    FALSE                    FALSE                    FALSE 
##          min_yaw_forearm   amplitude_roll_forearm  amplitude_pitch_forearm 
##                    FALSE                    FALSE                    FALSE 
##    amplitude_yaw_forearm      total_accel_forearm        var_accel_forearm 
##                    FALSE                    FALSE                    FALSE 
##         avg_roll_forearm      stddev_roll_forearm         var_roll_forearm 
##                    FALSE                    FALSE                    FALSE 
##        avg_pitch_forearm     stddev_pitch_forearm        var_pitch_forearm 
##                    FALSE                    FALSE                    FALSE 
##          avg_yaw_forearm       stddev_yaw_forearm          var_yaw_forearm 
##                    FALSE                    FALSE                    FALSE 
##          gyros_forearm_x          gyros_forearm_y          gyros_forearm_z 
##                    FALSE                    FALSE                    FALSE 
##          accel_forearm_x          accel_forearm_y          accel_forearm_z 
##                    FALSE                    FALSE                    FALSE 
##         magnet_forearm_x         magnet_forearm_y         magnet_forearm_z 
##                    FALSE                    FALSE                    FALSE 
##                   classe 
##                    FALSE
sum(is.na(pmlTrain1[,"kurtosis_yaw_belt"])) # 19622 count to confirm
## [1] 19622
sum(is.na(pmlTrain1[,"skewness_yaw_belt"])) # 19622
## [1] 19622
sum(is.na(pmlTrain1[,"kurtosis_yaw_dumbbell"])) # 19622
## [1] 19622
sum(is.na(pmlTrain1[,"skewness_yaw_dumbbell"])) # 19622
## [1] 19622
sum(is.na(pmlTrain1[,"kurtosis_yaw_forearm"])) # 19622
## [1] 19622
sum(is.na(pmlTrain1[,"skewness_yaw_forearm"])) # 19622
## [1] 19622
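Rather than scanning the full TRUE/FALSE listing by eye, the names of the all-missing columns can be extracted directly. A minimal sketch on a hypothetical toy data frame (toy and its columns are assumptions for illustration):

```r
# Hypothetical toy data frame standing in for pmlTrain1.
toy <- data.frame(a = c(1, NA, 3),
                  b = c(NA, NA, NA),
                  c = c("x", "y", NA))

# TRUE for columns in which every value is missing.
allMissing <- sapply(toy, function(x) all(is.na(x)))

# which() keeps only the TRUE entries; names() returns their column names.
allMissingNames <- names(which(allMissing))
print(allMissingNames)

# Confirm in one step: each all-missing column has as many NAs as there are rows.
stopifnot(all(colSums(is.na(toy[allMissingNames])) == nrow(toy)))
```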

The following code selects the variables that have data for every observation in the data set. Create a vector of missing value counts for the columns of the data frame pmlTrain1, keep the entries whose count is 0 (i.e. the pmlTrain1 variables with no missing values), and use their names to create the subset pmlTrain2.

pmlTrain1MissingCounts <- sapply(pmlTrain1, function(x)sum(is.na(x)))
pmlTrain1Complete <- pmlTrain1MissingCounts[pmlTrain1MissingCounts==0]
pmlTrain2 <- pmlTrain1[,names(pmlTrain1Complete)]

This code verifies that the processed training and testing data sets contain the same variables, except for the problem_id variable in the testing set and the classe variable in the training set. The 160 variables have been reduced to 60.

setequal(names(pmlTest1),names(pmlTrain2))
## [1] FALSE
setdiff(names(pmlTest1),names(pmlTrain2))
## [1] "problem_id"
setdiff(names(pmlTrain2),names(pmlTest1))
## [1] "classe"
intersect(names(pmlTest1),names(pmlTrain2))
##  [1] "X"                    "user_name"            "raw_timestamp_part_1"
##  [4] "raw_timestamp_part_2" "cvtd_timestamp"       "new_window"          
##  [7] "num_window"           "roll_belt"            "pitch_belt"          
## [10] "yaw_belt"             "total_accel_belt"     "gyros_belt_x"        
## [13] "gyros_belt_y"         "gyros_belt_z"         "accel_belt_x"        
## [16] "accel_belt_y"         "accel_belt_z"         "magnet_belt_x"       
## [19] "magnet_belt_y"        "magnet_belt_z"        "roll_arm"            
## [22] "pitch_arm"            "yaw_arm"              "total_accel_arm"     
## [25] "gyros_arm_x"          "gyros_arm_y"          "gyros_arm_z"         
## [28] "accel_arm_x"          "accel_arm_y"          "accel_arm_z"         
## [31] "magnet_arm_x"         "magnet_arm_y"         "magnet_arm_z"        
## [34] "roll_dumbbell"        "pitch_dumbbell"       "yaw_dumbbell"        
## [37] "total_accel_dumbbell" "gyros_dumbbell_x"     "gyros_dumbbell_y"    
## [40] "gyros_dumbbell_z"     "accel_dumbbell_x"     "accel_dumbbell_y"    
## [43] "accel_dumbbell_z"     "magnet_dumbbell_x"    "magnet_dumbbell_y"   
## [46] "magnet_dumbbell_z"    "roll_forearm"         "pitch_forearm"       
## [49] "yaw_forearm"          "total_accel_forearm"  "gyros_forearm_x"     
## [52] "gyros_forearm_y"      "gyros_forearm_z"      "accel_forearm_x"     
## [55] "accel_forearm_y"      "accel_forearm_z"      "magnet_forearm_x"    
## [58] "magnet_forearm_y"     "magnet_forearm_z"
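The same comparison can be turned into a check that fails loudly if the two sets ever drift apart. This is a sketch on hypothetical, shortened name vectors; it also checks column order, which setequal ignores.

```r
# Hypothetical shortened stand-ins for names(pmlTrain2) and names(pmlTest1).
trainNames <- c("X", "roll_belt", "pitch_belt", "classe")
testNames  <- c("X", "roll_belt", "pitch_belt", "problem_id")

# Drop the one column that is expected to differ in each set.
trainPredictors <- setdiff(trainNames, "classe")
testPredictors  <- setdiff(testNames, "problem_id")

sameSet   <- setequal(trainPredictors, testPredictors)   # same names, any order
sameOrder <- identical(trainPredictors, testPredictors)  # same names, same order
print(c(sameSet = sameSet, sameOrder = sameOrder))

# Stop with an error if the predictor columns do not line up.
stopifnot(sameSet, sameOrder)
```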

Exploratory Data Analysis

Notes on the first seven variables of pmlTrain2:

1 "X" (observation ID)
2 "user_name" "carlitos" "pedro" "adelmo" "charles" "eurico" "jeremy"
3 "raw_timestamp_part_1" 1322832937
4 "raw_timestamp_part_2" 204334
5 "cvtd_timestamp" 02/12/2011 13:35

sort(unique(pmlTrain2$cvtd_timestamp))
"02/12/2011 13:32" "02/12/2011 13:33" "02/12/2011 13:34" "02/12/2011 13:35"
"02/12/2011 14:56" "02/12/2011 14:57" "02/12/2011 14:58" "02/12/2011 14:59"
"05/12/2011 11:23" "05/12/2011 11:24" "05/12/2011 11:25"
"05/12/2011 14:22" "05/12/2011 14:23" "05/12/2011 14:24"
"28/11/2011 14:13" "28/11/2011 14:14" "28/11/2011 14:15"
"30/11/2011 17:10" "30/11/2011 17:11" "30/11/2011 17:12"

-4 days (Dec 2 and 5 of 2011, Nov 28 and 30 of 2011)
-6 attempts
 Dec 2 and 5 of 2011: 13:32 to 13:35, 14:56 to 14:59, 11:23 to 11:25, 14:22 to 14:24
 Nov 28 and 30 of 2011: 14:13 to 14:15, 17:10 to 17:12

6 "new_window" no yes
7 "num_window" 1-864
A sliding window approach was used for feature extraction, with window lengths from 0.5 sec to 2.5 sec and 0.5 sec overlap, resulting in a total of 96 derived feature sets.

Sliding windows are one approach to the sequential supervised learning problem (see mlsd-ssspr.pdf).
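To make the idea concrete, here is a minimal sketch of sliding-window feature extraction. The helper name windowFeatures, the toy signal, and the choice of mean and variance as features are assumptions for illustration only; they are not taken from the original study.

```r
# Hypothetical helper: summarise a signal over overlapping windows.
# width and step are in samples; consecutive windows overlap by width - step.
windowFeatures <- function(signal, width, step) {
  stopifnot(length(signal) >= width, step >= 1)
  starts <- seq(1, length(signal) - width + 1, by = step)
  data.frame(
    start = starts,
    mean  = sapply(starts, function(s) mean(signal[s:(s + width - 1)])),
    var   = sapply(starts, function(s) var(signal[s:(s + width - 1)]))
  )
}

# Toy signal of 10 samples; width 4 and step 2 give windows that overlap by 2.
signal <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
features <- windowFeatures(signal, width = 4, step = 2)
print(features)
```

Each row of the result is one window, so a long sensor recording becomes a table of per-window summaries that an ordinary classifier can use.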

There are 13 variables each for belt, arm, dumbbell, and forearm. 13x4=52.

names(pmlTrain2[8:20])
"roll_belt" "pitch_belt" "yaw_belt" "total_accel_belt"
"gyros_belt_x" "gyros_belt_y" "gyros_belt_z" "accel_belt_x"
"accel_belt_y" "accel_belt_z" "magnet_belt_x" "magnet_belt_y"
"magnet_belt_z"

names(pmlTrain2[21:33])
"roll_arm" "pitch_arm" "yaw_arm" "total_accel_arm"
"gyros_arm_x" "gyros_arm_y" "gyros_arm_z" "accel_arm_x"
"accel_arm_y" "accel_arm_z" "magnet_arm_x" "magnet_arm_y"
"magnet_arm_z"

names(pmlTrain2[34:46])
"roll_dumbbell" "pitch_dumbbell" "yaw_dumbbell" "total_accel_dumbbell"
"gyros_dumbbell_x" "gyros_dumbbell_y" "gyros_dumbbell_z" "accel_dumbbell_x"
"accel_dumbbell_y" "accel_dumbbell_z" "magnet_dumbbell_x" "magnet_dumbbell_y"
"magnet_dumbbell_z"

names(pmlTrain2[47:59])
"roll_forearm" "pitch_forearm" "yaw_forearm" "total_accel_forearm"
"gyros_forearm_x" "gyros_forearm_y" "gyros_forearm_z" "accel_forearm_x"
"accel_forearm_y" "accel_forearm_z" "magnet_forearm_x" "magnet_forearm_y"
"magnet_forearm_z"

60 "classe" (training set) or "problem_id" (testing set)
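The four sensor groups can also be selected by name rather than by column position, which keeps working if the column order changes. A minimal sketch, using a handful of the column names listed above as a stand-in for names(pmlTrain2):

```r
# A few of the real column names, used here as a stand-in for names(pmlTrain2).
cols <- c("roll_belt", "gyros_belt_x", "roll_arm", "magnet_arm_z",
          "yaw_dumbbell", "accel_dumbbell_y", "pitch_forearm", "gyros_forearm_z")

sensors <- c("belt", "arm", "dumbbell", "forearm")

# The pattern needs an underscore directly before the sensor name,
# so "_arm" cannot match inside "_forearm".
groups <- lapply(sensors, function(s) {
  grep(paste0("_", s, "(_|$)"), cols, value = TRUE)
})
names(groups) <- sensors
print(groups)
```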

Data Splitting

library(caret)
inTrain <- createDataPartition(y=pmlTrain2$classe,
                               p=0.75, list=FALSE)
training <- pmlTrain2[inTrain,]
testing <- pmlTrain2[-inTrain,]
dim(training)
## [1] 14718    60
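createDataPartition samples within each level of classe, so both partitions keep roughly the same class proportions. The base-R sketch below approximates that idea on toy data; it is not caret's implementation, and the helper name stratifiedIndex and the seed value are assumptions. The run above does not show a seed, so setting one before the split is worth considering if the partition needs to be reproducible.

```r
# Hypothetical helper: sample a fraction p of the row indices within each class.
stratifiedIndex <- function(y, p) {
  idx <- unlist(lapply(split(seq_along(y), y), function(rows) {
    # Index into rows explicitly so a single-row class is handled correctly.
    rows[sample.int(length(rows), size = ceiling(p * length(rows)))]
  }))
  sort(unname(idx))
}

set.seed(1234)  # arbitrary seed, chosen only for this illustration
y <- rep(c("A", "B", "C"), times = c(40, 20, 20))  # toy outcome
inTrainToy <- stratifiedIndex(y, p = 0.75)

print(table(y[inTrainToy]))    # training partition
print(table(y[-inTrainToy]))   # hold-out partition
```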

data exploration: plot the predictors and look for patterns, imbalance, outliers, unexplained groups, and skewed variables
-summary of predictors: min, max, mean, median, quartiles
-build training and testing sets; explore the training set only and leave the testing set as a hold-out set
-featurePlot: show scatterplots of outcome and predictors
-qplot: scatterplot of outcome and predictor (add color by variables; add regression smoothers to see trends)
-make numbers into categories or factors with cut2, then plot (add boxplots and overlay points)
-tables on factors and predictors showing counts or proportions (prop.table)
-density plots for continuous predictors to see where the bulk of the data is; color by belt, arm, dumbbell, and forearm?

table of classe in training (A 4185, B 2848, C 2567, D 2412, E 2706)

summary(as.factor(training$classe))
##    A    B    C    D    E 
## 4185 2848 2567 2412 2706
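The counts are easier to judge for imbalance as proportions. A minimal sketch using prop.table on the counts copied from the output above (the variable names are assumptions):

```r
# Class counts copied from the summary output above.
classeCounts <- c(A = 4185, B = 2848, C = 2567, D = 2412, E = 2706)

# prop.table divides each count by the total; round for readability.
classeProps <- round(prop.table(classeCounts), 3)
print(classeProps)

# The counts add up to the number of rows in the training partition.
print(sum(classeCounts))
```

Class A is the most frequent at roughly 28% of the training partition, and the other four classes are of broadly similar size.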

density plot belt predictors by classe

qplot(roll_belt,colour=classe,data=training,geom="density")

qplot(pitch_belt,colour=classe,data=training,geom="density")

qplot(yaw_belt,colour=classe,data=training,geom="density")

qplot(total_accel_belt,colour=classe,data=training,geom="density")

qplot(gyros_belt_x,colour=classe,data=training,geom="density")

qplot(gyros_belt_y,colour=classe,data=training,geom="density")

qplot(gyros_belt_z,colour=classe,data=training,geom="density")

qplot(accel_belt_x,colour=classe,data=training,geom="density")

qplot(accel_belt_y,colour=classe,data=training,geom="density")

qplot(accel_belt_z,colour=classe,data=training,geom="density")

qplot(magnet_belt_x,colour=classe,data=training,geom="density")

qplot(magnet_belt_y,colour=classe,data=training,geom="density")

qplot(magnet_belt_z,colour=classe,data=training,geom="density")

density plot arm predictors by classe

qplot(roll_arm,colour=classe,data=training,geom="density")

qplot(pitch_arm,colour=classe,data=training,geom="density")

qplot(yaw_arm,colour=classe,data=training,geom="density")

qplot(total_accel_arm,colour=classe,data=training,geom="density")

qplot(gyros_arm_x,colour=classe,data=training,geom="density")

qplot(gyros_arm_y,colour=classe,data=training,geom="density")

qplot(gyros_arm_z,colour=classe,data=training,geom="density")

qplot(accel_arm_x,colour=classe,data=training,geom="density")

qplot(accel_arm_y,colour=classe,data=training,geom="density")

qplot(accel_arm_z,colour=classe,data=training,geom="density")

qplot(magnet_arm_x,colour=classe,data=training,geom="density")

qplot(magnet_arm_y,colour=classe,data=training,geom="density")

qplot(magnet_arm_z,colour=classe,data=training,geom="density")

density plot dumbbell predictors by classe

qplot(roll_dumbbell,colour=classe,data=training,geom="density")

qplot(pitch_dumbbell,colour=classe,data=training,geom="density")

qplot(yaw_dumbbell,colour=classe,data=training,geom="density")

qplot(total_accel_dumbbell,colour=classe,data=training,geom="density")

qplot(gyros_dumbbell_x,colour=classe,data=training,geom="density")

qplot(gyros_dumbbell_y,colour=classe,data=training,geom="density")

qplot(gyros_dumbbell_z,colour=classe,data=training,geom="density")

qplot(accel_dumbbell_x,colour=classe,data=training,geom="density")

qplot(accel_dumbbell_y,colour=classe,data=training,geom="density")

qplot(accel_dumbbell_z,colour=classe,data=training,geom="density")

qplot(magnet_dumbbell_x,colour=classe,data=training,geom="density")

qplot(magnet_dumbbell_y,colour=classe,data=training,geom="density")

qplot(magnet_dumbbell_z,colour=classe,data=training,geom="density")

density plot forearm predictors by classe

qplot(roll_forearm,colour=classe,data=training,geom="density")

qplot(pitch_forearm,colour=classe,data=training,geom="density")

qplot(yaw_forearm,colour=classe,data=training,geom="density")

qplot(total_accel_forearm,colour=classe,data=training,geom="density")

qplot(gyros_forearm_x,colour=classe,data=training,geom="density")

qplot(gyros_forearm_y,colour=classe,data=training,geom="density")

qplot(gyros_forearm_z,colour=classe,data=training,geom="density")

qplot(accel_forearm_x,colour=classe,data=training,geom="density")

qplot(accel_forearm_y,colour=classe,data=training,geom="density")

qplot(accel_forearm_z,colour=classe,data=training,geom="density")

qplot(magnet_forearm_x,colour=classe,data=training,geom="density")

qplot(magnet_forearm_y,colour=classe,data=training,geom="density")

qplot(magnet_forearm_z,colour=classe,data=training,geom="density")
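The 52 calls above differ only in the predictor name, so they can be generated in a loop. The sketch below is a base-graphics alternative on toy data, not the qplot calls used above; the helper name densityByClass and the toy data frame are assumptions for illustration.

```r
# Hypothetical helper: overlay one density curve per class for a single predictor.
densityByClass <- function(data, predictor, outcome = "classe") {
  dens <- lapply(split(data[[predictor]], data[[outcome]]), density)
  xs <- range(unlist(lapply(dens, function(d) d$x)))
  ys <- c(0, max(unlist(lapply(dens, function(d) d$y))))
  # Empty frame sized to fit every curve, then one coloured line per class.
  plot(xs, ys, type = "n", xlab = predictor, ylab = "density", main = predictor)
  for (i in seq_along(dens)) lines(dens[[i]], col = i)
  legend("topright", legend = names(dens), col = seq_along(dens), lty = 1)
  invisible(dens)
}

# Toy stand-in for the training partition.
set.seed(1)
toy <- data.frame(
  classe    = rep(c("A", "B"), each = 50),
  roll_belt = c(rnorm(50, 0), rnorm(50, 3)),
  yaw_belt  = c(rnorm(50, 1), rnorm(50, 1))
)

pdf(NULL)  # null device so the sketch runs non-interactively; omit to see plots
result <- lapply(c("roll_belt", "yaw_belt"), function(v) densityByClass(toy, v))
invisible(dev.off())
```

With the real data, the vector of predictor names would be one of the 13-variable sensor groups, for example names(training)[8:20] for the belt.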

histograms - belt

with(training, hist(roll_belt))

with(training, hist(pitch_belt))

with(training, hist(yaw_belt))

with(training, hist(total_accel_belt))

with(training, hist(gyros_belt_x))

with(training, hist(gyros_belt_y))

with(training, hist(gyros_belt_z))

with(training, hist(accel_belt_x))

with(training, hist(accel_belt_y))

with(training, hist(accel_belt_z))

with(training, hist(magnet_belt_x))

with(training, hist(magnet_belt_y))

with(training, hist(magnet_belt_z))

histograms - arm

with(training, hist(roll_arm))

with(training, hist(pitch_arm))

with(training, hist(yaw_arm))

with(training, hist(total_accel_arm))

with(training, hist(gyros_arm_x))

with(training, hist(gyros_arm_y))

with(training, hist(gyros_arm_z))

with(training, hist(accel_arm_x))

with(training, hist(accel_arm_y))

with(training, hist(accel_arm_z))

with(training, hist(magnet_arm_x))

with(training, hist(magnet_arm_y))

with(training, hist(magnet_arm_z))

histograms - dumbbell

with(training, hist(roll_dumbbell))

with(training, hist(pitch_dumbbell))

with(training, hist(yaw_dumbbell))

with(training, hist(total_accel_dumbbell))

with(training, hist(gyros_dumbbell_x))

with(training, hist(gyros_dumbbell_y))

with(training, hist(gyros_dumbbell_z))

with(training, hist(accel_dumbbell_x))

with(training, hist(accel_dumbbell_y))

with(training, hist(accel_dumbbell_z))

with(training, hist(magnet_dumbbell_x))

with(training, hist(magnet_dumbbell_y))

with(training, hist(magnet_dumbbell_z))

histograms - forearm

with(training, hist(roll_forearm))

with(training, hist(pitch_forearm))

with(training, hist(yaw_forearm))

with(training, hist(total_accel_forearm))

with(training, hist(gyros_forearm_x))

with(training, hist(gyros_forearm_y))

with(training, hist(gyros_forearm_z))

with(training, hist(accel_forearm_x))

with(training, hist(accel_forearm_y))

with(training, hist(accel_forearm_z))

with(training, hist(magnet_forearm_x))

with(training, hist(magnet_forearm_y))

with(training, hist(magnet_forearm_z))
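The histograms can likewise be drawn in a loop, several to a page, which makes skewed or multi-modal predictors easier to compare side by side. A minimal sketch on toy data (the data frame toy is hypothetical; its column names follow the belt group):

```r
# Toy stand-in for four columns of the training partition.
set.seed(1)
toy <- data.frame(roll_belt        = rnorm(100),
                  pitch_belt       = runif(100),
                  yaw_belt         = rexp(100),
                  total_accel_belt = rpois(100, 10))

pdf(NULL)                       # null device; omit this line to see the plots
oldPar <- par(mfrow = c(2, 2))  # 2 x 2 grid of panels on one page

# One histogram per column, titled with the column name.
hists <- lapply(names(toy), function(v) {
  hist(toy[[v]], main = v, xlab = v)
})

par(oldPar)                     # restore the previous layout
invisible(dev.off())
```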

summary - belt, arm, dumbbell, forearm

summary(training[8:20])
##    roll_belt        pitch_belt          yaw_belt       total_accel_belt
##  Min.   :-28.90   Min.   :-54.9000   Min.   :-179.00   Min.   : 0.00   
##  1st Qu.:  1.10   1st Qu.:  1.7825   1st Qu.: -88.30   1st Qu.: 3.00   
##  Median :113.00   Median :  5.2900   Median : -13.20   Median :17.00   
##  Mean   : 64.28   Mean   :  0.2268   Mean   : -11.18   Mean   :11.29   
##  3rd Qu.:123.00   3rd Qu.: 14.8000   3rd Qu.:  13.07   3rd Qu.:18.00   
##  Max.   :162.00   Max.   : 60.3000   Max.   : 179.00   Max.   :29.00   
##   gyros_belt_x       gyros_belt_y       gyros_belt_z    
##  Min.   :-1.04000   Min.   :-0.64000   Min.   :-1.4600  
##  1st Qu.:-0.03000   1st Qu.: 0.00000   1st Qu.:-0.2000  
##  Median : 0.03000   Median : 0.02000   Median :-0.1000  
##  Mean   :-0.00414   Mean   : 0.03996   Mean   :-0.1287  
##  3rd Qu.: 0.11000   3rd Qu.: 0.11000   3rd Qu.: 0.0000  
##  Max.   : 2.22000   Max.   : 0.64000   Max.   : 1.6200  
##   accel_belt_x       accel_belt_y     accel_belt_z    magnet_belt_x   
##  Min.   :-120.000   Min.   :-69.00   Min.   :-275.0   Min.   :-52.00  
##  1st Qu.: -21.000   1st Qu.:  3.00   1st Qu.:-162.0   1st Qu.:  9.00  
##  Median : -15.000   Median : 34.00   Median :-151.0   Median : 35.00  
##  Mean   :  -5.468   Mean   : 30.06   Mean   : -72.4   Mean   : 55.86  
##  3rd Qu.:  -5.000   3rd Qu.: 61.00   3rd Qu.:  27.0   3rd Qu.: 60.00  
##  Max.   :  85.000   Max.   :164.00   Max.   : 105.0   Max.   :481.00  
##  magnet_belt_y   magnet_belt_z   
##  Min.   :354.0   Min.   :-621.0  
##  1st Qu.:582.0   1st Qu.:-375.0  
##  Median :601.0   Median :-319.0  
##  Mean   :593.6   Mean   :-345.3  
##  3rd Qu.:610.0   3rd Qu.:-306.0  
##  Max.   :673.0   Max.   : 293.0
summary(training[21:33])
##     roll_arm         pitch_arm          yaw_arm          total_accel_arm
##  Min.   :-180.00   Min.   :-88.200   Min.   :-180.0000   Min.   : 1.00  
##  1st Qu.: -31.80   1st Qu.:-26.100   1st Qu.: -42.7000   1st Qu.:17.00  
##  Median :   0.00   Median :  0.000   Median :   0.0000   Median :27.00  
##  Mean   :  17.37   Mean   : -4.635   Mean   :  -0.7672   Mean   :25.49  
##  3rd Qu.:  76.80   3rd Qu.: 11.100   3rd Qu.:  45.5000   3rd Qu.:33.00  
##  Max.   : 180.00   Max.   : 88.500   Max.   : 180.0000   Max.   :66.00  
##   gyros_arm_x       gyros_arm_y       gyros_arm_z       accel_arm_x     
##  Min.   :-6.3700   Min.   :-3.4400   Min.   :-2.3300   Min.   :-404.00  
##  1st Qu.:-1.3600   1st Qu.:-0.8000   1st Qu.:-0.0700   1st Qu.:-241.00  
##  Median : 0.0600   Median :-0.2400   Median : 0.2300   Median : -43.00  
##  Mean   : 0.0323   Mean   :-0.2556   Mean   : 0.2707   Mean   : -59.45  
##  3rd Qu.: 1.5700   3rd Qu.: 0.1600   3rd Qu.: 0.7200   3rd Qu.:  84.00  
##  Max.   : 4.8700   Max.   : 2.8100   Max.   : 3.0200   Max.   : 435.00  
##   accel_arm_y       accel_arm_z       magnet_arm_x     magnet_arm_y   
##  Min.   :-318.00   Min.   :-636.00   Min.   :-584.0   Min.   :-392.0  
##  1st Qu.: -55.00   1st Qu.:-144.00   1st Qu.:-296.0   1st Qu.: -12.0  
##  Median :  13.00   Median : -48.00   Median : 297.0   Median : 199.0  
##  Mean   :  32.31   Mean   : -72.03   Mean   : 196.3   Mean   : 155.1  
##  3rd Qu.: 139.75   3rd Qu.:  22.00   3rd Qu.: 641.0   3rd Qu.: 322.0  
##  Max.   : 308.00   Max.   : 271.00   Max.   : 782.0   Max.   : 583.0  
##   magnet_arm_z   
##  Min.   :-597.0  
##  1st Qu.: 122.0  
##  Median : 441.0  
##  Mean   : 304.2  
##  3rd Qu.: 544.0  
##  Max.   : 690.0
summary(training[34:46])
##  roll_dumbbell     pitch_dumbbell     yaw_dumbbell     
##  Min.   :-153.71   Min.   :-137.34   Min.   :-150.871  
##  1st Qu.: -17.04   1st Qu.: -40.79   1st Qu.: -77.742  
##  Median :  48.46   Median : -21.03   Median :  -5.092  
##  Mean   :  24.20   Mean   : -10.72   Mean   :   1.065  
##  3rd Qu.:  67.61   3rd Qu.:  17.63   3rd Qu.:  78.540  
##  Max.   : 153.38   Max.   : 137.03   Max.   : 154.952  
##  total_accel_dumbbell gyros_dumbbell_x   gyros_dumbbell_y  
##  Min.   : 0.00        Min.   :-204.000   Min.   :-2.10000  
##  1st Qu.: 4.00        1st Qu.:  -0.030   1st Qu.:-0.14000  
##  Median :10.00        Median :   0.130   Median : 0.03000  
##  Mean   :13.75        Mean   :   0.157   Mean   : 0.04621  
##  3rd Qu.:20.00        3rd Qu.:   0.350   3rd Qu.: 0.21000  
##  Max.   :58.00        Max.   :   2.220   Max.   :52.00000  
##  gyros_dumbbell_z   accel_dumbbell_x  accel_dumbbell_y  accel_dumbbell_z 
##  Min.   : -2.3800   Min.   :-419.00   Min.   :-182.00   Min.   :-334.00  
##  1st Qu.: -0.3100   1st Qu.: -51.00   1st Qu.:  -8.00   1st Qu.:-142.00  
##  Median : -0.1300   Median :  -8.00   Median :  43.00   Median :  -2.00  
##  Mean   : -0.1242   Mean   : -28.85   Mean   :  53.05   Mean   : -38.95  
##  3rd Qu.:  0.0300   3rd Qu.:  11.00   3rd Qu.: 111.00   3rd Qu.:  37.00  
##  Max.   :317.0000   Max.   : 235.00   Max.   : 315.00   Max.   : 318.00  
##  magnet_dumbbell_x magnet_dumbbell_y magnet_dumbbell_z
##  Min.   :-643.0    Min.   :-3600     Min.   :-250.00  
##  1st Qu.:-536.0    1st Qu.:  231     1st Qu.: -45.00  
##  Median :-480.0    Median :  311     Median :  14.00  
##  Mean   :-330.8    Mean   :  223     Mean   :  45.57  
##  3rd Qu.:-308.0    3rd Qu.:  391     3rd Qu.:  94.00  
##  Max.   : 592.0    Max.   :  633     Max.   : 451.00
summary(training[47:59])
##   roll_forearm       pitch_forearm     yaw_forearm     total_accel_forearm
##  Min.   :-180.0000   Min.   :-72.40   Min.   :-180.0   Min.   :  0.00     
##  1st Qu.:  -0.7075   1st Qu.:  0.00   1st Qu.: -68.2   1st Qu.: 29.00     
##  Median :  21.1000   Median :  8.95   Median :   0.0   Median : 36.00     
##  Mean   :  33.5942   Mean   : 10.62   Mean   :  19.2   Mean   : 34.75     
##  3rd Qu.: 140.0000   3rd Qu.: 28.40   3rd Qu.: 110.0   3rd Qu.: 41.00     
##  Max.   : 180.0000   Max.   : 89.80   Max.   : 180.0   Max.   :108.00     
##  gyros_forearm_x    gyros_forearm_y     gyros_forearm_z   
##  Min.   :-22.0000   Min.   : -7.02000   Min.   : -8.0900  
##  1st Qu.: -0.2100   1st Qu.: -1.48000   1st Qu.: -0.1800  
##  Median :  0.0500   Median :  0.03000   Median :  0.0800  
##  Mean   :  0.1605   Mean   :  0.07943   Mean   :  0.1564  
##  3rd Qu.:  0.5600   3rd Qu.:  1.62000   3rd Qu.:  0.4900  
##  Max.   :  3.9700   Max.   :311.00000   Max.   :231.0000  
##  accel_forearm_x   accel_forearm_y  accel_forearm_z   magnet_forearm_x 
##  Min.   :-496.00   Min.   :-632.0   Min.   :-446.00   Min.   :-1280.0  
##  1st Qu.:-180.00   1st Qu.:  55.0   1st Qu.:-182.00   1st Qu.: -617.0  
##  Median : -57.00   Median : 201.0   Median : -41.00   Median : -377.0  
##  Mean   : -62.06   Mean   : 163.6   Mean   : -56.08   Mean   : -312.9  
##  3rd Qu.:  77.00   3rd Qu.: 312.0   3rd Qu.:  25.00   3rd Qu.:  -72.0  
##  Max.   : 477.00   Max.   : 923.0   Max.   : 291.00   Max.   :  672.0  
##  magnet_forearm_y magnet_forearm_z
##  Min.   :-896     Min.   :-973.0  
##  1st Qu.:   5     1st Qu.: 199.0  
##  Median : 589     Median : 511.0  
##  Mean   : 379     Mean   : 395.6  
##  3rd Qu.: 736     3rd Qu.: 653.0  
##  Max.   :1480     Max.   :1090.0
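
The summaries above show a few values far from the rest of their distributions, for example a minimum of -204 for gyros_dumbbell_x and a maximum of 311 for gyros_forearm_y. The following is a minimal sketch of one way to locate such rows. It uses a small toy data frame in place of `training`. The helper `flag_extreme` and the threshold `k = 10` are hypothetical choices for illustration, not part of the original analysis.

```r
# Toy stand-in for `training`; values are illustrative only
toy <- data.frame(
  gyros_dumbbell_x = c(0.13, -0.03, 0.35, -204, 0.02),
  gyros_forearm_y  = c(0.03, -1.48, 1.62, 311, 0.10)
)

# Flag values more than k median absolute deviations from the median
# (k = 10 is an arbitrary, hypothetical threshold)
flag_extreme <- function(x, k = 10) {
  abs(x - median(x, na.rm = TRUE)) > k * mad(x, na.rm = TRUE)
}

# Rows where at least one column holds a flagged value
extreme_rows <- which(rowSums(sapply(toy, flag_extreme)) > 0)
extreme_rows
```

On the real data the same call could be applied to `training[8:59]`; whether flagged rows should be dropped or kept is a separate modelling decision.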

caret featurePlots - belt

featurePlot(x=training[,c("roll_belt","pitch_belt","yaw_belt")],
            y=training$classe,
            plot="pairs")

featurePlot(x=training[,c("total_accel_belt","accel_belt_x","accel_belt_y","accel_belt_z")],
            y=training$classe,
            plot="pairs")

featurePlot(x=training[,c("gyros_belt_x","gyros_belt_y","gyros_belt_z")],
            y=training$classe,
            plot="pairs")

featurePlot(x=training[,c("magnet_belt_x","magnet_belt_y","magnet_belt_z")],
            y=training$classe,
            plot="pairs")

caret featurePlots - arm

featurePlot(x=training[,c("roll_arm","pitch_arm","yaw_arm")],
            y=training$classe,
            plot="pairs")

featurePlot(x=training[,c("total_accel_arm","accel_arm_x","accel_arm_y","accel_arm_z")],
            y=training$classe,
            plot="pairs")

featurePlot(x=training[,c("gyros_arm_x","gyros_arm_y","gyros_arm_z")],
            y=training$classe,
            plot="pairs")

featurePlot(x=training[,c("magnet_arm_x","magnet_arm_y","magnet_arm_z")],
            y=training$classe,
            plot="pairs")

caret featurePlots - dumbbell

featurePlot(x=training[,c("roll_dumbbell","pitch_dumbbell","yaw_dumbbell")],
            y=training$classe,
            plot="pairs")

featurePlot(x=training[,c("total_accel_dumbbell","accel_dumbbell_x","accel_dumbbell_y","accel_dumbbell_z")],
            y=training$classe,
            plot="pairs")

featurePlot(x=training[,c("gyros_dumbbell_x","gyros_dumbbell_y","gyros_dumbbell_z")],
            y=training$classe,
            plot="pairs")

featurePlot(x=training[,c("magnet_dumbbell_x","magnet_dumbbell_y","magnet_dumbbell_z")],
            y=training$classe,
            plot="pairs")

caret featurePlots - forearm

featurePlot(x=training[,c("roll_forearm","pitch_forearm","yaw_forearm")],
            y=training$classe,
            plot="pairs")

featurePlot(x=training[,c("total_accel_forearm","accel_forearm_x","accel_forearm_y","accel_forearm_z")],
            y=training$classe,
            plot="pairs")

featurePlot(x=training[,c("gyros_forearm_x","gyros_forearm_y","gyros_forearm_z")],
            y=training$classe,
            plot="pairs")

featurePlot(x=training[,c("magnet_forearm_x","magnet_forearm_y","magnet_forearm_z")],
            y=training$classe,
            plot="pairs")

Column ranges and variable names for each of the four sensor locations:

names(training[8:20]) # belt
##  [1] "roll_belt"        "pitch_belt"       "yaw_belt"        
##  [4] "total_accel_belt" "gyros_belt_x"     "gyros_belt_y"    
##  [7] "gyros_belt_z"     "accel_belt_x"     "accel_belt_y"    
## [10] "accel_belt_z"     "magnet_belt_x"    "magnet_belt_y"   
## [13] "magnet_belt_z"
names(training[21:33]) # arm
##  [1] "roll_arm"        "pitch_arm"       "yaw_arm"        
##  [4] "total_accel_arm" "gyros_arm_x"     "gyros_arm_y"    
##  [7] "gyros_arm_z"     "accel_arm_x"     "accel_arm_y"    
## [10] "accel_arm_z"     "magnet_arm_x"    "magnet_arm_y"   
## [13] "magnet_arm_z"
names(training[34:46]) # dumbbell
##  [1] "roll_dumbbell"        "pitch_dumbbell"       "yaw_dumbbell"        
##  [4] "total_accel_dumbbell" "gyros_dumbbell_x"     "gyros_dumbbell_y"    
##  [7] "gyros_dumbbell_z"     "accel_dumbbell_x"     "accel_dumbbell_y"    
## [10] "accel_dumbbell_z"     "magnet_dumbbell_x"    "magnet_dumbbell_y"   
## [13] "magnet_dumbbell_z"
names(training[47:59]) # forearm
##  [1] "roll_forearm"        "pitch_forearm"       "yaw_forearm"        
##  [4] "total_accel_forearm" "gyros_forearm_x"     "gyros_forearm_y"    
##  [7] "gyros_forearm_z"     "accel_forearm_x"     "accel_forearm_y"    
## [10] "accel_forearm_z"     "magnet_forearm_x"    "magnet_forearm_y"   
## [13] "magnet_forearm_z"
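
The column ranges above are hard-coded. As an alternative sketch, the indices can be looked up from the variable names, which avoids breakage if the column order changes. The vector `cols` below is a shortened, illustrative stand-in for `names(training)`, and `idx` is a hypothetical name.

```r
# Shortened stand-in for names(training)
cols <- c("X", "user_name", "roll_belt", "gyros_belt_x",
          "roll_arm", "accel_arm_x",
          "roll_dumbbell", "magnet_dumbbell_z",
          "roll_forearm", "gyros_forearm_y", "classe")

locations <- c("belt", "arm", "dumbbell", "forearm")

# Match "_<location>" followed by "_" or end of name, so that
# "_arm" does not also pick up the "_forearm" columns
idx <- lapply(setNames(locations, locations),
              function(loc) grep(paste0("_", loc, "(_|$)"), cols))
idx
```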

Train options

train parameters:
  method
  data
  preProcess
  weights
  metric (categorical outcome: Accuracy; continuous outcome: RMSE)

trainControl parameters:
  method (resampling): boot, boot632, cv, repeatedcv, LOOCV
  number (iterations)
  repeats
  p (size of training set)
  initialWindow (for time points)
  horizon (for time points)
  savePredictions
  summaryFunction
  preProcOptions
  predictionBounds
  seeds
  allowParallel
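
A minimal sketch of how a few of these options fit together is shown below. It uses the built-in iris data and an rpart tree rather than the project's data and model, purely to keep the example small. The choice of 5 folds is an assumption made for illustration.

```r
library(caret)

set.seed(123)  # make the resampling folds reproducible

# 5-fold cross-validation; method and number are illustrative choices
ctrl <- trainControl(method = "cv", number = 5, allowParallel = FALSE)

# Categorical outcome, so the metric is Accuracy
# (RMSE would apply to a continuous outcome)
fit <- train(Species ~ ., data = iris,
             method = "rpart",
             metric = "Accuracy",
             trControl = ctrl)

fit$results
```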

Boosting

Model Fit

# Left commented out because the fit took about 6 hours to run.
# library(caret)
# library(doMC)
# registerDoMC(4)   # register 4 cores for parallel resampling
# Sys.time()        # start time
# grid <- expand.grid(mtry=100)   # note: exceeds the 52 predictors in columns 8:59
# fitControl <- trainControl(
#                            method = "repeatedcv",   # 10-fold CV
#                            number = 10,
#                            repeats = 10)            # repeated ten times

# modFit <- train(x=pmlTrain2[,8:59],y=pmlTrain2[,60],
#                 method="rf",
#                 prox=TRUE,
#                 tuneGrid = grid,
#                 trControl = fitControl)
# Sys.time()        # end time
# modFit
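
As a rough guide to why the commented-out fit was slow, the resampling settings can be turned into a count of model fits: one fit per resample per tuning value, plus a final fit on all rows. The sketch below also estimates the size of the proximity matrix requested by prox=TRUE, under the assumption that pmlTrain2 has the 19,622 rows of the raw training file. All variable names here are illustrative.

```r
folds    <- 10   # number = 10
repeats  <- 10   # repeats = 10
n_tuning <- 1    # rows in expand.grid(mtry = 100)

resample_fits <- folds * repeats * n_tuning
total_fits    <- resample_fits + 1   # plus the final fit on all rows
total_fits
# [1] 101

n_rows   <- 19622                    # assumption: rows in pmlTrain2
prox_gib <- n_rows^2 * 8 / 1024^3    # 8-byte doubles in an n x n matrix, in GiB
round(prox_gib, 1)
```

Reducing `repeats`, or dropping `prox=TRUE` when proximities are not needed, would be the first settings to revisit if run time matters.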